Erin Bennett, Judith Degen, Michael Henry Tessler, Justine Kao & Noah D. Goodman (Stanford)

motivation

point of departure

relevance

  • belief + intention + rationality \(\Rightarrow\) action

 

problem

  • beliefs are latent, only act is directly observable

     


mundane beliefs may matter

sentence

Joe eats many burgers.

 

cardinal surprise reading

Joe eats more burgers than we would expect of him.

     


(e.g., Schoeller & Franke 2015)

how to determine beliefs?

real-world frequencies

  • may not exist or may not be known

 

experimental measures

  • give-number task & inference of parameterized distribution
    • e.g., Manski (2004); Tauber & Steyvers (2013)
    • scoring rules
  • binned histogram slider rating task (Kao et al., 2014)

binned histogram slider ratings

manyPriors

take average normalized slider ratings to reflect population-level belief
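A minimal sketch of this aggregation step, assuming each subject gives one slider rating per bin for an item. The function names, the example ratings, and the handling of all-zero ratings are illustrative assumptions, not taken from the study.

```python
def normalize(ratings):
    """Rescale one subject's slider ratings for one item so they sum to 1."""
    total = sum(ratings)
    if total == 0:
        # Assumption: all-zero ratings are treated as a uniform distribution.
        return [1.0 / len(ratings)] * len(ratings)
    return [r / total for r in ratings]


def average_normalized(ratings_by_subject):
    """Average the normalized ratings bin by bin across subjects."""
    normalized = [normalize(r) for r in ratings_by_subject]
    n_subjects = len(normalized)
    n_bins = len(normalized[0])
    return [
        sum(subject[b] for subject in normalized) / n_subjects
        for b in range(n_bins)
    ]


# Made-up ratings from three hypothetical subjects over four bins.
example = [
    [0.1, 0.4, 0.4, 0.1],
    [0.0, 1.0, 1.0, 0.0],
    [0.2, 0.2, 0.2, 0.2],
]
population_estimate = average_normalized(example)
```

Normalizing before averaging keeps a subject who uses the whole slider range from outweighing a subject who uses only part of it.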

agenda

goal: scrutinize BH task

  • relate subjects' answers, their beliefs & population-level aggregate


approach: hierarchical Bayesian modeling

  • data from multiple task types (within-subject)
  • infer latent subjective & "population-level beliefs"
    • "population-level belief" => central tendency of subjective beliefs
    • think: "mean of a Gaussian hyperprior"

experiment

overview

  • 50 participants recruited via MTurk
  • each saw every condition of every task
  • 8 items (from previous research)
  • 3 task types:
    • BH: binned histogram
    • GAN: give-a-number
    • PC: paired comparison

items

  1. "X has just fetched himself a cup of coffee from the office vending machine."
    • "What do you think the temperature of his coffee is?"
  2. "X commuted to work yesterday."
    • "How many minutes do you think she spent commuting yesterday?"
  3. "X told a joke to N kids."
    • "How many of the kids do you think laughed?"
  4. "X bought a laptop."
    • "How much do you think it cost?"
  5. "X threw N marbles into a pool."
    • "How many of the marbles do you think sank?"
  6. "X just went to the movies to see a blockbuster."
    • "How many minutes long do you think the movie was?"
  7. "X watched TV last week."
    • "How many hours do you think he spent watching TV last week?"
  8. "X bought a watch."
    • "How much do you think it cost?"

BH task


priorsslider

GAN task


priorsnumbers

PC task


priorslightning

results

slider ratings

dataslider

number estimates

dataslider

lightning round

dataslider

Bayesian inference

questions

general

  • do average BH ratings approximate latent population-level averages well?

 

specific

  • relation: choice behavior, subjective & population-level beliefs

model

modelGraph

set-up

  • implemented in JAGS
  • 50,000 samples after a burn-in of 100,000
  • convergence checks: visual inspection and \(\hat{R}\)
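For reference, a sketch of the classic (non-split) Gelman-Rubin \(\hat{R}\) for one scalar parameter sampled in several chains of equal length. The slides do not say which variant or software was used for the check, so the formula choice and the name `r_hat` are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def r_hat(chains):
    """Classic Gelman-Rubin statistic for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    # W: average within-chain sample variance.
    within = mean([variance(chain) for chain in chains])
    # B: between-chain variance, scaled by the chain length.
    between = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)
```

Values close to 1 suggest that the chains agree; chains stuck in different regions give values well above 1.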

population-level beliefs \(Q_j\)

postPriors

red: averaged normalized slider ratings; black: mean posterior \(Q_j\) with 95% HDIs

individual vs. population-level beliefs

postSubjPriors

black: mean posterior \(Q_j\) with 95% HDIs; dark gray: mean posterior \(P_{ij}\)

upshot

 

  • subjective beliefs differ from population-level mean (good!)
  • averaged normalized slider ratings approximate the posterior mean of \(Q_j\) reasonably well (excellent!)

model criticism

PPC averaged normalized slider

ppcSlider

the averaged normalized slider ratings we would expect under the model, given the posterior distribution over parameters, are virtually indistinguishable from the observed ones

PPC number choice

ppcNumber

some frequencies of number choices are surprising under the trained model (but: little data to go on; round or salient numbers may play a role)

PPC lightning round

ppcChoice

the model fails badly to predict that "no marble sank" is judged more likely than "one marble sank" (alternative explanation: subjects revise their beliefs and assume homogeneous "wonkiness" of the marbles)

posterior predictive p-values

posteriorP
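As a reminder of what such a value measures, here is a generic sketch: the share of replicated data sets whose test statistic is at least as large as the observed one. The test statistic actually used in the study is not given on the slides, and the function name is hypothetical.

```python
def posterior_predictive_p(observed_stat, replicated_stats):
    """Share of replicated test statistics at least as large as the observed one."""
    n_extreme = sum(1 for stat in replicated_stats if stat >= observed_stat)
    return n_extreme / len(replicated_stats)
```

Each entry of `replicated_stats` would come from one posterior sample: simulate a data set from the model under that sample, then compute the statistic on it. Values near 0 or 1 indicate that the model rarely reproduces the observed statistic.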

conclusions

conclusions


  • averaged normalized slider ratings appear to be practical and reliable measures


  • slider ratings closely track the population-level central tendency of individual beliefs


  • hierarchical modeling of population-level beliefs is possible


  • link functions for task choices seem reliable (enough)

modeling details

model

modelGraph

hierarchical population prior

  • \(w \sim \text{Gamma}(2,0.1)\)
  • \(Q_{j} \sim \text{Dirichlet}(1,\dots, 1)\)
  • \(P_{ij} \sim \text{Dirichlet}(w Q_j)\)
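A forward-sampling sketch of this prior for a single item \(j\), using only the Python standard library. It assumes the JAGS shape/rate reading of \(\text{Gamma}(2, 0.1)\), i.e. a prior mean of 20 for \(w\); the function names, the number of subjects, and the number of bins are illustrative assumptions.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha, 1) draws."""
    while True:
        draws = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(draws)
        if total > 0:  # redraw in the rare case that every draw underflows to 0
            return [d / total for d in draws]


def sample_prior(n_subjects, n_bins, rng):
    """Sample w, the population-level belief Q_j, and subject beliefs P_ij."""
    # Assumption: Gamma(2, 0.1) is shape/rate as in JAGS; gammavariate
    # expects shape and scale, so the scale is 1 / 0.1.
    w = rng.gammavariate(2.0, 1.0 / 0.1)
    q = sample_dirichlet([1.0] * n_bins, rng)
    p = [sample_dirichlet([w * q_k for q_k in q], rng) for _ in range(n_subjects)]
    return w, q, p


rng = random.Random(0)
w, q, p = sample_prior(n_subjects=3, n_bins=5, rng=rng)
```

The concentration \(w\) controls how tightly the subjective beliefs \(P_{ij}\) cluster around \(Q_j\): larger values of \(w\) give subjects whose beliefs are closer to the population-level belief.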


w = 20

w = 200

link function: sliders

link function: numbers

link function: lightning